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Abstract: Statistical inference of the fundamental parameters of supersymmetric theories is a 
challenging and active endeavor. Several sophisticated algorithms have been employed to this end. 
While Markov-Chain Monte Carlo (MCMC) and nested sampling techniques are geared towards 
Bayesian inference, they have also been used to estimate frequentist confidence intervals based 
on the profile likelihood ratio. We investigate the performance and appropriate configuration of 
MultiNest, a nested sampling based algorithm, when used for profile likelihood-based analyses 
both on toy models and on the parameter space of the Constrained MSSM. We find that while 
the standard configuration previously used in the literarture is appropriate for an accurate recon- 
struction of the Bayesian posterior, the profile likelihood is poorly approximated. We identify a 
more appropriate MultiNest configuration for profile likelihood analyses, which gives an ex- 
cellent exploration of the profile likelihood (albeit at a larger computational cost), including the 
identification of the global maximum likelihood value. We conclude that with the appropriate con- 
figuration MultiNest is a suitable tool for profile likelihood studies, indicating previous claims 
to the contrary are not well founded. 
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1. Introduction 

The results of the first searches for supersymmetry (SUSY) at the Large Hadron Collider (LHC) 
have recently been released [1]. The field is eager to switch from a mode of excluding model 
parameters to an exciting and challenging era in which we are estimating the underlying parame- 
ters of a new fundamental theory. Within the context of SUSY, the technology has evolved from 
likelihood scans over low-dimensional subspaces of SUSY models [2-5] to higher-dimensional 
Bayesian analysis using Markov Chain Monte Carlo (MCMC) [6-8]. More recently, the nested 
sampling [9, 10] algorithm has been used by several authors for the study of SUSY models [11- 
20]. Although Bayesian techniques Like MCMC and MultiNest have been designed specifically 
to explore the Bayesian posterior distributions, they have also been used to obtain (approximate) 
profile likelihoods [14, 21, 22]. 

MultiNest [23, 24], a publicly available implementation of the nested sampling algorithm, 
has been shown to reduce the computational cost of performing Bayesian analysis typically by two 
orders of magnitude as compared with basic MCMC techniques. MultiNest has been integrated 
in the SuperBayeS code' for fast and efficient exploration of SUSY models. Recently, it has 
been demonstrated (at least for a limited region of the CMSSM parameter space around a specific 
benchmark point) that neural networks techniques can be used to reduce the computational efforts 
for these parameter scans by an additional factor of ~ 10^ [25]. 

For highly non-Gaussian problems like supersymmetric parameter determination, inference 
can depend strongly on whether one chooses to work with the posterior distribution (Bayesian) or 

'Available from: www . superbayes . org 
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profile likelihood (frequentist) [14, 21, 22]. There is a growing consensus that both the posterior 
and the profile likelihood ought to be explored in order to obtain a fuller picture of the statistical 
constraints from present-day and future data. This begs the question, which we address in this 
paper, of the algorithmic solutions available to reliably explore both the posterior and the profile 
likelihood in the context of SUSY phenomenology. 

Recently, a genetic algorithm (GA) based method was developed in [26] specifically to obtain 
the profile likelihoods for CMSSM parameters and the resultant distributions were then compared 
with the profile likelihoods obtained with MultiNest, run in the standard configuration com- 
monly used in the literature. The authors found that MultiNest missed several high likelihood 
regions in the CMSSM parameter space making the resultant profile likelihood functions highly 
inaccurate. They further questioned the accuracy of posterior distributions obtained with Multi- 
Nest. They went on to argue that MultiNest might not be able to find these high likelihood 
regions even if its termination criterion is adjusted for finding the profile likelihoods. One of the 
aims of this paper is to show that these shortcomings can be avoided through proper configuration 
of the MultiNest. 

The outline of this paper is as follows. In Sec. ^ we first give an introduction to the Bayesian 
and frequentist frameworks for statistical analysis. We briefly describe the MultiNest algorithm 
and how it can be tuned for evaluating profile likelihoods in Sec. |[ In Sec. |^ we use two relevant 
toy problems to study the ability of MultiNest to reconstruct both the posterior and profile likeli- 
hood. We then obtain the profile likehhood function of CMSSM parameters using MultiNest in 
Sec. H and compare our results with the GA method adopted in [26]. We also discuss implications 
for direct and indirect detection prospects. Finally we present our conclusions in Sec. |6[ 



2. Statistical Framework 
2.1 Bayesian Modelling 

We briefly recall some basics about Bayesian inference, referring the reader to e.g. [27] for further 
details. Bayesian inference is based on Bayes' theorem, stating that 

P(eiD) ^ MS). (2.) 

where P(0 [D) is the posterior probability distribution of the parameters given data D, ^(D 1 0) e 
£(0) is the likelihood function, P(0) is the prior, and P(D) = Z is the Bayesian evidence (or 
model likelihood). The latter is a normalization constant, obtained by averaging the likelihood over 
the prior: 

Z = y" £.{@)P{Q)d&. (2.2) 

In Bayesian statistics, the A^-dimensional posterior distribution described in Eq. ( p| ) constitutes 
the complete Bayesian inference of the parameter values. The inference about an individual param- 
eter 6i is then given by the marginalised posterior distribution P(0i[D) which can be obtained by 
integrating (marginalizing) the A^-dimensional posterior distribution over all the parameters apart 
from 9i. If samples from the full posterior are available (having been generated for example via 
Markov Chain Monte Carlo sampling), then the above integral is replaced by a simple counting 
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procedure: since posterior samples are distributed according to the posterior, the marginal pos- 
terior for 6i above is obtained from the full posterior by dividing the range of in a number of 
bins and counting how many posterior samples fall in each bin. A 1-D marginal posterior interval 
corresponding to a symmetric credible region containing 1 — a of posterior probability for Oi is 
delimited by an interval 6^] such that: 



oo 



P(0i|D)d0i = a/2 and / P{9i\D)dei = a/2. (2.3) 

D Je+ 

2.2 Profile Likelihood-Based Frequentist Analysis 

In the classical or frequentist school of statistics, one defines the probability of an event as the limit 
of its frequency in a large number of trials. Classical confidence intervals based on the Neyman 
construction are defined as the set of parameter points in which some real-valued function, or test 
statistic, t evaluated on the data falls in an acceptance region Wq = If the boundaries 

of the acceptance regions are such that P{t £ W& | 0) > 1 — a is satisfied, then the resulting 
intervals will cover the true value of with a probability of at least 1 — a. Likelihood ratios are 
often chosen as the test statistic on which frequentist intervals are based. When is composed 
of parameters of interest, 6, and nuisance parameters, ijj, a common choice of test statistic is the 
profile Ukelihood ratio 

X{e) (2.4) 

where ip is the conditional maximum likelihood estimate (MLE) of ip with 9 fixed and 9,ip are 
the unconditional MLEs. Under certain regularity conditions, Wilks showed that the distribution 
of —2 In A (6*) converges to a chi-square distribution with a number of degrees of freedom given by 
the dimensionality of 9 [28]. Thus, by assuming that the asymptotic distribution is a good approx- 
imation of the finite sample case, one can trivially define the acceptance region W@ from standard 
lookup tables. One must keep in mind that these intervals based on the asymptotic properties of 
the profile likelihood ratio require certain regularity conditions to be fulfilled and are not guaran- 
teed to have strict coverage. In particular, in cases with complex multimodal likelihoods one might 
reasonably suspect that the distribution of —2 In X{9) is still far from converging to its asymptotic 
form (see [25] for an illustration). While there exist modifications to the profile likelihood ratio 
that converge more rapidly [29-31], within particle physics the profile likelihood ratio is a standard 
technique^ . 

One can approximate the X{9) using an arbitrary sampling of the likelihood function as long 
as one has access to the value of the likelihood function at the sampled points. The procedure is 
simple: one groups the samples in bins of 9 and for each bin searches for the maximum likelihood 
sample in that bin, which corresponds to C{9, ip). 



It can be seen from Eq. ( |2.4| ) that the profile likelihood function requires the conditional MLEs 
V'(6') to be found for each value of 9 as well as the unconditional MLE {9,'ip), which represents 
the global maximum likelihood point. If the unconditional MLE is not properly resolved, this will 



"In the following, we will refer to the profile likelihood ratio as "profile likelihood" or "likelihood function" for 
brevity. 
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effectively result in a different threshold on —2 In A(^) corresponding to a wrong confidence level. 
Even if the unconditional MLE is found, we might also worry how well the conditional MLEs ■>p{9) 
are found, particularly when C{6, Tp) <C C{9, ip). 



3. Using MultiNest for Profile Likelihood Exploration 

Nested sampling [9] is a Monte Carlo method whose primary aim is the efficient calculation of 



the Bayesian evidence given by Eq. (2.2) As a by-product, the algorithm also produces posterior 



samples which can be used to map out the posterior distribution of Eq. ( |2. 1[ ). Those same samples 
have also been used to estimate the profile likelihood, a point to which we return below in greater 
detail. Nested sampling calculates the evidence by transforming the multi-dimensional evidence 
integral into a one-dimensional integral that is easy to evaluate numerically. This is accomplished 
by defining the prior volume X as dX = P{@)d@, so that 



X(A) = / P{@)dG, (3.1) 

where the integral extends over the region(s) of parameter space contained within the iso-likelihood 



contour C{@) = A. The evidence integral, Eq. (2.2), can then be written as: 



1 

C{X)dX, (3.2) 



where C{X), the inverse of Eq. ( p.l[ ), is a monotonically decreasing function of X. Thus, if one 
can evaluate the likelihoods = C{Xi), where Xi is a sequence of decreasing values, 

< Xm < • • • < ^2 < Xi < Xo = 1, (3.3) 

the evidence can be approximated numerically using standard quadrature methods as a weighted 
sum 

M 

Z = J2^iW,, (3.4) 

1=1 

where the weights Wi for the simple trapezium rule are given by Wi = |(A'j_i — Xj+i). 

The summation in Eq. ( |3.4| ) is performed as follows. The iteration counter is first set to i = 
and niive 'active' (or 'live') samples are drawn from the full prior P{@), so the initial prior volume 
is Xq = 1. The samples are then sorted in order of their likelihood and the smallest (with likelihood 
Cq) is removed from the active set (hence becoming 'inactive') and replaced by a point drawn from 
the prior subject to the constraint that the point has a likelihood C > Cq. The corresponding 
prior volume contained within the iso-likelihood contour associated with the new live point will 
be a random variable given by Xi = tiX^, where ti follows the distribution P{t) = Nt^~^ 
(i.e., the probabiUty distribution for the largest of N samples drawn uniformly from the interval 
[0, 1]). At each subsequent iteration i, the removal of the lowest likelihood point Ci in the active 
set, the drawing of a replacement with L> Li and the reduction of the corresponding prior volume 
Xi = tiXi^i are repeated, until the entire prior volume has been traversed. The algorithm thus 
travels through nested shells of likelihood as the prior volume is reduced. 
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The nested sampling algorithm is terminated when the evidence has been computed to a pre- 
specified precision. The evidence that could be contributed by the remaining live points is estimated 
as AZi = Cyaa.xXi, where £max is the maximum-likelihood value of the remaining live points, and 
Xi is the remaining prior volume. The algorithm terminates when AZ^ is less than a user-defined 
value. 

The most challenging task in implementing nested sampling is to draw samples from the prior 
within the hard constraint C > Q at each iteration i. The MultiNest algorithm [23, 24] tackles 
this problem through an ellipsoidal rejection sampling scheme. The live point set is enclosed 
within a set of (possibly overlapping) ellipsoids and a new point is then drawn uniformly from the 
region enclosed by these ellipsoids. The ellipsoidal decomposition of the live point set is chosen 
to minimize the sum of volumes of the ellipsoids. The ellipsoidal decomposition is well suited to 
dealing with posteriors that have curving degeneracies, and allows mode identification in multi- 
modal posteriors. If there are subsets of the ellipsoid set that do not overlap with the remaining 
elhpsoids, these are identified as a distinct mode and subsequently evolved independently. 

The two most important parameters that control the parameter space exploration in nested sam- 
pUng are the number of live points nuve - which determines the resolution at which the parameter 
space is explored - and a tolerance parameter tol, which defines the termination criterion based 
on the accuracy of the evidence. As discussed in the previous section, evaluating profile likeli- 
hoods is much more challenging than evaluating posterior distributions. Therefore, one should not 
expect that a vanilla setup for MultiNest (which is adequate for an accurate exploration of the 
posterior distribution) will automatically be optimal for profile likelihoods evaluation. Generally, a 
larger number of hve points is necessary to explore profile likelihoods accurately. Moreover, set- 
ting tol to a smaller value results in MultiNest gathering a larger number of samples in the high 
likelihood regions (as termination is delayed). This is usually not necessary for the posterior distri- 
butions, as the prior volume occupied by high likelihood regions is usually very small and therefore 
these regions have relatively small probability mass. For profile likelihoods, however, getting as 
close to the true global maximum is crucial and therefore one should set tol to a relatively smaller 
value. We now investigate these issues in the context of two relevant toy problems. 

4. Application to Toy Models 

In order to demonstrate the differences between profile likelihoods and the posterior distributions 
reconstruction with MultiNest, we consider two 8-D toy problems in this section. The dimen- 
sionality of these toy problems is chosen to correspond to the dimensionality of CMSSM analyses 
in which 4 CMSSM parameters are varied along with 4 Standard Model (SM) parameters (see 



The first problem we consider is a uni-modal 8-D isotropic Gaussian with the following like- 
lihood function: 



with a = 0.05 and /ij = 0.5 (j = 1, . . . , 8). We assume uniform priors ZY(0, 1) for all 8 parameters. 
The analytical posterior distributions and profile likelihood functions are shown in Fig. |l| for a 



Sec. |). 




j=i 



(4.1) 
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(a) (b) 

Figure 1 : Analytical 1 -D and 2-D (a) marginalized posterior distributions and (b) profile likelihood functions 
for the unimodal 8-D Gaussian problem. The contours represent the 68% and 95% Bayesian credible regions 
in (a) and profile likelihood confidence intervals in (b) respectively. 



subset of the parameters. As expected, the posterior and profile likelihood distributions are identical 
in this case. 

We now reconstruct both profile likelihood and posterior distribution with MultiNest, using 
niive = 1000 with tol = 0.5 (which is expected to be adequate for posterior reconstruction) and 1 x 
10"'^ (a much lower value targeted at profile likelihood reconstruction). The posterior distributions 
and profile likelihoods recovered by MultiNest for the first three parameters are shown in Figs. ^ 
and ^ respectively. It is evident from these plots that recovered posterior distributions for both 
tol = 0.5 and 1 x 10"'^ are almost identical to the analytical posterior more than 3a into the tails. 
However, the recovered profile likelihood for tol = 0.5 is quite noisy and inaccurate, especially 
around the peak. This is because MultiNest has not explored the high likelihood region in 
great detail, as the algorithm is terminated after about 163,000 likelihood evaluations, which is 
adequate to calculate the evidence to sufficient accuracy. We can circumvent this problem by setting 
tol = 1 X 10~^ (dotted blue curves), which results in a larger number of likelihood evaluations, 
around 261,000. The situation around the peak is now greatly improved, although the tails of the 
profile likelihood are still not explored to very high accuracy beyond the ~ 3a region. Nevertheless, 
the overall profile likelihood has been recovered with sufficient accuracy to allow for a reasonably 
correct parameter estimation. 

The second problem we consider is a multi-modal 8-D Gaussian mixture model with the fol- 
lowing likelihood function: 



(2vr)4 nj 



■ exp 



1 ^ 

-y 

j=i 



(^i-Wj)' 



(4.2) 
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Figure 2: 1-D posterior distributions for the first 3 parameters of the uni-modal 8-D Gaussian problem. 
The red curve shows the analytical posterior distribution while the blue dashed and grey dotted curves show 
the posterior distributions recovered by MultiNest with tol set to 0.5 and 1 x 10~^ respectively. 1000 
live points were used in both cases. The upper and lower panels show the posterior and In of posterior 
distributions respectively. 



with the following values for the parameters defining the likelihood function: 

Mi:wi = 0.98, cTii = 0.05 and /^n = 0.5 (1 < i < 8); (4.3) 

M2:w2 = 0.01, o-2i = 0.02 {1 < i < 8), ^21 = 0.55, ^2i = 0.5 (2 < i < 8); (4.4) 
A^s : W3 = 0.01, 0-31 = 0.004, f732 = 0.1, o-3i = 0.02 (3 < i < 8), 

/i3i = 0.3,/i3i = 0.5(2<i<8). (4.5) 

We assume uniform priors ^(0,1) for all 8 parameters. This describes a multi-modal distribution 
with the broad Gaussian of the previous example centered in the middle of the parameter space. 
A narrow isotropic Gaussian lies slightly to the right of the broad Gaussian in 9i while another 
narrow Gaussian lies to the left. We denote these three modes by Aii, M.2 and respectively. 
A^3 has its variances in the first two dimensions chosen in such a way to mimic the funnel region 
at low mi 12 values in the CMSSM model. The weights of the mixture model are set such that the 
M.I occupies 98% of the posterior probability mass while M.2 and A^3 collectively occupy only 
2% of the probability mass and therefore we would expect the posterior probability distribution to 
be dominated by A^i. The analytical 1-D and 2-D posterior distributions and profile likelihood 
functions are shown in Fig. ^ (only distributions of the first 3 parameters are shown, as parameters 
04 onwards behave exactly as ^3). As expected, the posterior distributions are dominated almost 
entirely hy M.i, the mode with the largest posterior probability mass. M.^, barely registers in the 
posterior distribution as the secondary peak in the 61 direction, while 7W2 is completely masked by 
Ml. 

The profile likelihood, on the other hand, is dominated by the narrow, highly-peaked modes 
M2 and 7W3, whose peak likelihood values are 9.25 and 9.61 times higher, respectively, than the 
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Figure 3: 1-D profile likelihood functions for the first 3 parameters of the uni-modal 8-D Gaussian problem. 
The red curve shows the analytical profile hkehhood functions while the blue dashed and grey dotted curves 
show the profile UkeUhoods recovered by MultiNest with tol set to 0.5 and 1 x 10"'' respectively. 1000 
live points were used in both cases. The upper and lower panels show the profile likelihood and —2 In of 
profile likelihood functions respectively. Green and magenta horizontal lines represent the la and 2a profile 
likelihood intervals respectively. 



likelihood value at the peak of A^i. This is indeed reflected in the right panels of Fig. ^ The 
1-D profile likelihood for 9i is bi-modal with M2 and M.^ constituting the two modes. 7^3 only 
produces a heavier, non-Gaussian tail extending downwards from 61 = 0.55 (where the maximum 
likelihood peak of M.2 is located). Looking instead at the profile likelihood for 62, the (very 
subdominant) contribution from M.i barely shows up as a small extra peak on top of the broader 
peak from at 62 = 0.5. The profile likelihood for 62 consists almost entirely of M.^ as its 
standard deviation in 6*2 is 5 times larger than the one of M.2 while the likelihood value at the peak 
of 3 is only slightly smaller than the value for M.2- We notice that confidence regions for the 
profile likelihood plotted in the 2-D panels are only approximately correct, as we have assumed 
that the likelihood distribution follows a multivariate Gaussian distribution in order to derive the 
confidence limits as discussed in Sec. [2.2[ , which is obviously not correct for this particular toy 
model. 

These dramatic differences in the posterior and profile likelihood functions make this problem 
an extremely challenging one for any numerical inference algorithm to tackle. Even just finding 
M2 and 7W3 is an extremely difficult task as they are essentially spikes in the probability distribu- 
tion. We show the recovered 1-D posterior distributions and profile likelihood with MultiNest 
in Figs. ^ and ^ respectively. With nii^e = 4, 000 and tol = 0.5 (resulting in about 725,000 likeli- 
hood evaluations), MultiNest finds all three Gaussians and the recovered posterior distributions 
are almost identical to the analytical posterior distributions. However, the profile likelihood func- 
tions are highly inaccurate and very noisy. This is because the high likelihood regions have not 
been explored in great detail as the sampling proceeds according to the posterior mass, which is 
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(a) (b) 

Figure 4: 1-D and 2-D (a) marginalized posterior distributions and (b) profile likelihood functions for the 
8-D Gaussian mixture problem. The contours represent the 68% and 95% Bayesian credible regions in (a) 
and profile likelihood confidence intervals in (b) respectively. The plots were generated after dividing the 
analytical posterior values in 200 bins so that a fair comparison can be made with the distributions obtained 
using MultiNest. 



heavily concentrated in A^i. Therefore, the modes A^2 and A^3 are explored with a proportionally 
lower number of live points, as their posterior mass is small. 

This problem can be solved by running MultiNest with nn^c = 20, 000 and decreasing tol 
to 1 X 10^^ which results in around 1 1 million likelihood evaluations. Increasing the number of 
live points ensures that A^3 encloses more live points and is explored at higher resolution. As 
can be seen from Fig. ^ the recovered profile likelihood functions are much more accurate and 
less noisy with this setup. The 1-D profile hkelihoods for 9i and ^3 are almost identical to the 
analytical ones, but we can see some under-sampling of the profile likelihood for 62- Despite this, 
just finding AI3 is already an extremely challenging task for any numerical technique, including 
GA, and although the recovered profile likelihood functions by MultiNest are not perfect, they 
are reasonably accurate, certainly within the 2a confidence region. 

5. Application to the CMSSM 

The Minimal Supersymmetric Standard Model (MSSM) [32, 33] with R-parity can solve the hier- 
archy problem and provide a candidate dark matter (DM) particle. The MSSM with one particular 
choice of universal boundary conditions at the grand unification scale, is called the Constrained 
Minimal Supersymmetric Standard Model (CMSSM) [34]. In the CMSSM, the scalar mass mo, 
gaugino mass mi/2 and tri-linear coupling Aq are assumed to be universal at a gauge unification 
scale Mgut ~ 2 x 10^^ GeV. In addition, at the electroweak scale one selects tan /3, the ratio 
of Higgs vacuum expectation values and sign(^), where /i is the Higgs/higgsino mass parameter 
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Figure 5: 1-D posterior distributions for the first 3 parameters of the 8-D Gaussian mixture problem. The 
red curve shows the analytical posterior distribution while the blue dashed and grey dotted curves show 
the posterior distributions recovered by MultiNest with nuvc = 4, 000, tol = 0.5 and nuvc — 20, 000, 
tol = 1 X 10^* respectively. The upper and lower panels show the posterior and In of posterior distributions 
respectively. 



whose square is computed from the potential minimisation conditions of electroweak symmetry 
breaking (EWSB) and the empirical value of the mass of the boson, Mz- The family uni- 
versality assumption is well motivated since flavour changing neutral currents are observed to be 
rare. Indeed several string models (see, for example Ref. [35, 36]) predict approximate MSSM 
universality in the soft terms. 

The CMSSM has proved to be a popular choice for SUSY phenomenology because of the 
small number of free parameters and it has recently been studied quite extensively in multi- 
parameter scans, both from a frequentist [4, 5, 37] and a Bayesian perspective [6-8, 14, 15, 22, 26, 
38]. Bayesian and frequentist analyses are expected to give consistent inferences for the CMSSM 
parameters if the data are sufficiently constraining. However, several groups have concluded that 
we do not yet have enough constraining power in the available data to overridethe influence of 
priors in the Bayesian framework. Furthermore, the structure of the parameter space is such that 
the likelihood presents "spikes" that contain little posterior mass (for reasonable choice of priors). 
As a consequence, regions of high likelihood (favoured in a frequentist analysis) do not necessarily 
match regions of large posterior mass (favoured under a Bayesian analysis), see e.g. [14]. There- 
fore, Bayesian and frequentist inferences should not be expected to yield quantitatively consistent 
results. Both hurdles are expected to be overcome once ATLAS results become available, see [39], 
although the actual performance of different statistical approaches, as measured by their coverage 
properties, has only just begun to be scrutinized [25, 40]. 

Here we focus on the algorithmic problem of obtaining reliable posterior probability and pro- 
file hkelihood maps with MultiNest. The authors of Ref. [26] claimed that the GA method 
produces more accurate profile likelihood functions than MultiNest. They also raised questions 
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Figure 6: 1-D profile likelihood functions for the first 3 parameters of the 8-D Gaussian mixture problem. 
The red curve shows the analytical profile likelihood function while the blue dashed and grey dotted curves 
show the profile UkeUhoods recovered by MultiNest with nuvo = 4, 000, tol = 0.5 and nuvc = 20, 000, 
tol = 1 X lO""* respectively. The upper and lower panels show the profile likelihood and —2 In of pro- 
file likelihood functions respectively. Green and magenta horizontal lines represent the la and 2a profile 
likelihood intervals respectively. 



about the accuracy of the posterior distributions obtained with MultiNest. The previous section 
demonstrates that the posterior distributions from MultiNest are highly reliable, as they match 
very well the analytical solution for both toy problems out to more than 3a in the tails. We stress 
that this is the case even for the less computationally intensive MultiNest configuration, with 
niive = 1000 and tol = 0.5. 

We now reconstruct both the posterior distribution and the profile likelihood for the CMSSM 
parameters using MultiNest, as implemented in the SuperBayeS vl . 5 code in almost ex- 
actly the same setup employed in Ref. [26]. The set of CMSSM and SM parameters together con- 
stitute an 8-dimensional parameter space with =(mo, rni^2^ tan/3, mt, mii{mf,)'^'^^ , a^^^, 

(Mz)) to be scanned and constrained with the presently available experimental data. The 
ranges over which we explore these parameters are tuq, mi/2 G (0.05, 4) TeV, Aq € (—4, 4) TeV, 
tan/3 G (2, 65), nit G (167.0, 178.2) GeV, mf,(mfc)^ G (3.92, 4.48) GeV, l/a^{Mz) G 
(127.835, 128.075) and a^{Mz) G (0.1096, 0.1256). We use a slightly wider range for tan /3 
and a smaller range for Aq which was allowed to vary between —7 TeV and 7 TeV in [26] as there 
is very little probability mass as well as very few points with high likelihood with \Aq\ > 4 TeV. 
The observables used in our analysis are exactly the same as given in Table 1 of Ref. [26], in order 
to allow a comparison. 

As discussed in Sees. |3| and 0, a larger uh^q and smaller tol should be used for profile likelihood 
analyses with MultiNest. We therefore used nii^e = 20,000 and tol = 1 x 10"^. Flat priors 
were imposed on all 8 parameters with parameter ranges described above. This resulted in about 5.5 
million likelihood evaluations (compared to 3 million likelihood evaluations for the GA method) 
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Figure 7: 2-D (a) marginalized posterior distributions and (b) profile likelihoods for the CMSSM parameters 
using all presently available constraints. The contours represent the 68% and 95% Bayesian credible regions 
in (a) and profile likelihood confidence intervals in (b). Flat priors with 20,000 live points were used with 
tol set to 1 X 10^'* (no smoothing applied to the plots). The mean and best-fit parameters values are shown 
by a black cross and a green circle, respectively. 



taking 6 days on 10 2.4GHz CPUs and returned an evidence value of log(^) = —18.62 it 0.08. 
The resultant 2-D marginalized posterior distributions and profile likelihoods in the mi/2 — 
and tan /3 — mo planes are shown in Fig. ^. By comparing the posterior distribution in the upper 
left panel of Fig. ^ with the upper left panel of Fig. 13 in [14], obtained with the same setup but 
with niive = 4, 000 and tol = 0.5, it is apparent that the posterior is stable with respect to increas- 
ing the number of samples (Fig. uses ~ 20 times more samples than the corresponding figure 
in Ref. [14]). Indeed, the evidence value obtained with this setup is log{Z) = —18.28 ± 0.14. 
The difference in log evidence between the two setups is thus Alog(2) = —0.34 ± 0.16. For 
the log prior scans, the difference in log evidence between the two configurations is even lower. 




Figure 8: 1-D profile likelihoods for the CMSSM parameters normalized to the global best-fit point. The 
red solid and blue dotted vertical lines represent the global best-fit point (x^ = 9.26, located in the focus 
point region) and the best-fit point found in the stau co-annihilation region (x^ — 11.38) respectively. The 
upper and lower panel show the profile likelihood and Ax'^ values, respectively. Green (magenta) horizontal 
lines represent the la {2a) approximate confidence intervals. MultiNest was run with flat priors, 20,000 
live points and tol = 1 x 10^**. 



Alog(-E) = —0.08 ± 0.14. As this systematic difference is comparable with the statistical 
uncertainty, we can conclude that the posterior mass which is being missed by the standard, 
Bayesian setup is negligible (especially so given that the systematic difference is much less than 
the typical difference in thresholds on the Jeffreys' scale for the strength of evidence, which are 
at Alog(Z) = 1.0, 2.5, 5.0, see e.g. [27]). This leads us to reject the speculation in Ref. [26] re- 
garding the validity of posterior probability distributions obtained with the standard configuration 
of MultiNest. 

The 2-D profile likelihood plots (bottom panels in Fig. ^ should be compared with Fig. 1(a) 
in Ref. [26], which were obtained with the GA^. Our best-fit point (i.e, the unconditional MLE) 
has = 9.26 and lies in the focus point region. Therefore, la and 2a regions are approximately 
dehmited by < 11.56 and < 15.43, respectively. The MultiNest best-fit point has a 
slightly better value than the GA best-fit point (x^ = 9.35). However, MultiNest la and 
2a regions are much larger than the corresponding GA regions. This is because MultiNest has 
been able to map out more accurately the regions surrounding the best-fit than GA. By comparing 
2-D posterior and profile likelihood plots in Fig. ^, we notice that most of the la profile likelihood 
interval in the focus point region of the mi/2 — mo lies outside of the corresponding 2a Bayesian 
credible interval. This is a clear sign that the high likelihood region in the focus point region is 
spike-like. These regions are not well explored by MultiNest with the standard configuration 
(tol = 0.5). If one is interested in mapping out the posterior, however, this is fully justified, as 
these regions contribute almost nothing to the Bayesian evidence or the posterior probability. If 
however one wants to use MultiNest for profile likelihood intervals, then it is imperative to map 

^^We have verified that reducing nuvc = 4, 000, while keeping tol = 1 x 10^^, leads to profile likelihoods very 
similar to the ones shown in Fig. ^, albeit slightly more noisy. We thus do not show those results, but we remark that 
this configuration reduces the computational effort by a factor of ~ 3 compared with the case where nuvc = 20, 000, 
without degrading the profile likelihood results appreciably. 
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Figure 9: 2-D profile Ukelihood functions for (a) log priors and (b) flat+log merged chains for CMSSM 
parameters. The contours represent the 68% and 95% profile likelihood confidence intervals. 20,000 live 
points were used with tol set to 1 x 10^''. The mean and best-fit parameters values are shown by black cross 
and green circle respectively. 



out those regions in much greater detail. 

The values of global best-fit point and the best-fit point in stau co-annihilation region (COA) 
found by MultiNest are listed in Tab. [l| and have similar parameter values to ones found by GA. 
Fig. shows the 1-D profile likelihood. This figure should be compared with the corresponding 
1-D profile likelihoods obtained with the GA method in Fig. 3 of Ref. [26]. 

In principle, the profile likelihood does not depend on the choice of priors. However, in order 
to explore the parameter space using any Monte Carlo technique, a set of priors needs to be defined. 
Any numerical sampling technique effectively defines a metric on the parameter space of interest, 
and different choices of this metric will generally lead to different regions of the parameter space 
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Figure 10: 1-D profile likelihoods for the CMSSM parameters for flat+log merged chains, normalized to 
the global best-fit point. The red solid and blue dotted vertical lines represent the global best-fit point (x^ ~ 
9.26, located in the focus point region) and the best-fit point found in the stau co-annihilation region (x^ — 
11.38) respectively. The upper and lower panel show the profile likelihood and Ax^ values, respectively. 
Green (magenta) horizontal lines represent the la (2(7) approximate confidence intervals. MultiNest was 
run with flat priors, 20,000 live points and tol = 1 x 10^"*. 





Global BFP 


COA BFP 




(located in FP region) 




Model and Nuisance Parameters 


mo 


1748.7 GeV 


159.0 GeV 


■mi/2 


334.1 GeV 


411.4 GeV 


Ao 


1949.5 GeV 


946.2 GeV 


tan P 


55.7 


19.53 


mt 


173.0 GeV 


173.3 GeV 




4.19 GeV 


4.20 GeV 




0.1178 


0.1182 


1/q'^^^(Mz) 


127.952 


127.959 


Observables 


mw 


80.367 GeV 


80.371 GeV 




0.23156 


0.23153 


ia^'^^^ X 10i« 


6.8 


13.7 


BR(B^ Xs-y) X lO"* 


3.45 


2.95 




17.31 ps"^ 


19.0 ps^^ 


BR(Bu TU) X lO'' 


1.38 


1.46 


0.11408 


0.11038 


BR(Bs fi+fi-) 


4.08 X 10"** 


3.89 X 10"** 



Table 1: Best-fit parameter and observable values found from flat+log merged chains. These quantities are shown for 
both the global best-fit point, located in the focus point (FP) region, as well as the best-fit point in the stau co-annihilation 
(COA) region. 

to be explored in greater or lesser detail. Therefore, different choice of priors might lead to slightly 
different profile likelihoods. Uniform priors in log(mo) and log(mx/2) (called log priors) are often 
employed in Bayesian analyses of CMSSM as the masses are scale parameters. In order to get 
a feeling for the robustness of our profile likelihoods, we show the 2-D profile likelihoods from a 
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scan using log priors in Fig. ^a). As expected, the parameter space with lower mo and mi/2 values 
has been explored more thoroughly. The best-fit point found has = 11-44 and lies in the stau 
co-annihilation region (COA), thus this scan with log priors has missed a a higher likelihood point 
in the focus point region. Conversely, the scan with flat priors has explored the COA region in 
similar detail. Thus, while these tunings of MultiNest are clearly superior for profile likelihood 
analysis than the default settings for Bayesian analysis, there remains sensitivity to the choice of 
priors used in the scans as previously seen in the literature. 

We can obtain more robust profile likelihoods by simply merging the flat and log prior scans, 
the performing the profiling over the joint set of samples (encompassing about 10.85 million 
points). This does not come at a greater computational cost given that a responsible Bayesian 
analyses would estimate sensitivity to the choice of prior as well. The resulting 2-D and 1-D pro- 
file likelihoods are shown in Figs, ^(b) and [T^ respectively. The most obvious difference between 
the merged and flat profile likelihood distributions is the appearance of the funnel region at low 
mi/2 values which was not explored thoroughly with flat priors because of the its smaller prob- 
ability mass. Apart from this, it is clear that the merged profile likehhood distributions are quite 
similar to the distributions obtained with flat prior (see Fig. ^(b)). 

In the left panel of Fig. 11, we show the 2-D profile likelihood functions spin-independent 
scattering cross-section of the neutralino and a proton cjp^ versus the neutralino mass along 
with the latest experimental exclusion limits at the 90% confidence level from CDMS-II [41] and 
XENON 100 [42], derived under standard halo assumptions (and in particular, assuming a local 
DM density = 0.3 GeV/cm^). Our global best-fit point has quite a large cross-section (1.85 x 
10^^ pb) and almost the entire la region in the FP region would already be excluded by current 
experimental limits if one takes them at face value. However, there are several uncertainties that 
need to be accounted for in order to achieve a robust exclusion: astrophysical uncertainties in 
the local DM density and velocity distribution [43-45], systematic uncertainties related to our 
position in the Milky Way halo [46], dumpiness of the DM distribution and impact of baryonic 
physics [47], hadronic matrix elements uncertainties [48]. Taken together, all those uncertainties 
can easily exceed an order of magnitude. For this reason, we have chosen not to include the current 
direct detection limits in the likelihood function, although we notice qualitatively that the FP best- 
fit point is coming under pressure from such limits. On the other hand, the best-fit a^^ in the COA 
region is relatively small (2.02 x 10^^ pb) and is well below the current experimental limits. 

Regarding indirect detection prospects, we plot the 2-D profile likelihood functions for the 
velocity-averaged neutralino self-annihilation cross-section (av) against the neutralino mass 
in the right panel of Fig. 11 . Although there is a small region of the parameter space around 300 
GeV < < 500 GeV and -27.5 < log^o {o'v) < -26.5 with high likelihood points found by 
GA in [26] and missed by MultiNest, in general we have mapped the profile likelihood more 
thoroughly as can be seen by comparing the right panel of Fig. 11 with Fig. 6(a) in [26]. Our 
global best-fit point has (av) = 2.29 x lO^^^cm'^ s~^ which is very similar to the global best-fit 
point found in [26] and therefore we reach the same conclusion, namely that it might be possible to 
cover part of this high-likelihood FP region in the future with the Large Area Telescope (LAT)[49] 
aboard the Fermi gamma-ray space telescope. 
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Figure 11: 2-D profile likelihood functions for flat+log merged chains for spin-independent scattering cross- 
section of the neutralino and a proton cTp^ versus the neutralino mass (left panel) and velocity-averaged 
neutralino self-annihilation cross-section (av) versus the neutralino mass (right panel). The contours 
represent the 68% and 95% profile likelihood confidence intervals. 20,000 live points were used with tol set 
to 1 X 10~^. The best-fit parameters values are shown by green circle. The latest experimental exclusion 
hmits at the 90% confidence level from CDMS-II [41] and XENONIOO [42] are also shown in the left panel. 





Global BFP 


COA BFP 




(located in FP region) 






136.34 GeV 


156.46 GeV 


Direct Detection 




1.85 X 10"'' pb 


2.02 X 10"" pb 




1.96 X 10"*^ pb 


4.05 X 10"" pb 




1.34 X 10"*^ pb 


3.00 X 10"" pb 


Indirect Detection 


(av) 


2.29 X 10"^^ cm^ s"^ 


4.89 X 10"^* cm^ s'^ 



Table 2: Best-fit neutralino mass and dark matter direct and in- 
direct detection observables. These quantities are shown for both 
the global best-fit point, located in the focus point (FP) region, as 
well as the best-fit point in the stau co-annihilation (COA) region. 



6. Conclusions 

As the LHC impinges on the most antic- 
ipated regions of SUSY parameter space, 
the need for statistical techniques that will 
be able to cope with the complexity of 
SUSY phenomenology is greater than ever. 
Early on, the data will not be strong enough 
to dominate the impact of priors on Bayesian 
inference; thus, the field is preparing to 
present the complementary information pro- 
vided by Bayesian techniques and the tra- 
ditional results from the profile likelihood. 



In this paper we have shown that, when configured appropriately, MultiNest can be suc- 
cesfully employed for approximating the profile likelihood functions, even though it was primarily 
designed for Bayesian analyses. In particular, we have demonstrated that it is important to use a 
termination criterion that allows MultiNest to explore high-likelihood regions. We have also 
shown that the likelihood spikes which play a key role in the profile likelihood and are missed by 
MultiNest when it is run in Bayesian analysis mode, have no bearing on the posterior distribu- 
tion. Despite these significant improvements, we do find that the prior used in the MultiNest 
scan can influence the profile likelihood ratio, particularly if the global maximum of the likeli- 
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hood is spiky and in a region with extremely low prior probability. This can be partially abated by 
merging samples from scans with different priors, which are often available from studies of prior 
dependence in the corresponding Bayesian results. 

We compared our results and conclusions to those reported in Ref. [26], which used genetic 
algorithms for mapping the profile Ukelihood functions. The authors of that study concluded that 
their sampling algorithm resulted in better approximations to the profile likelihood than those ob- 
tained with samples from Bayesian techniques like MCMC and MultiNest. They also speculated 
that MultiNest might not be sufficiently accurate even for Bayesian analyses. Our results indi- 
cate that when run appropriately, MultiNest provides a significantly better approximation to the 
profile hkehhood than the genetic algorithm method presented in Ref. [26], particularly in the ex- 
ploration of the parameter space near the boundaries of the interval, albeit at a slightly increased 
computational cost. There are also indications that MultiNest might be more reliable in identify- 
ing the overall best-fit point than the genetic algorithm, as the latter only found the overall best-fit 
in 1 out of 10 runs (see Fig. 8 in Ref. [26]). 

We end by speculating that the ideal algorithm for profile likelihood based analyses in com- 
plex, multi-modal problems such as supersymmetric parameter spaces may require a hybrid of 
global scans like MCMC and MultiNest together with dedicated maximization packages like 
MiNUlT [50]. The challenge for most maximization packages - whether they are based on gradi- 
ent descent, conjugate gradient descent, or the EM-algorithm - is that they can get stuck in local 
maxima. Hence, it seems clear that a more global scan will be required. However, once the scan- 
ning algorithm has located a basin of attraction the maximization packages are hkely to be the best 
tool for refining the local maxima. As we discussed here, MultiNest with a low tolerance acts 
as a maximizer, and can be subject to similar problems with local maxima. The nested structure 
provided by the nested sampling algorithm may provide an efficient and practical way for defining 
those basins of attraction, reducing the number of points that will initiate a dedicated maximiza- 
tion. In future work we wish to investigate these hybrid approaches and compare Minuit and 
MultiNest for this dedicated maximization step. 
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